BMC Genomics

15 training papers 2019-06-25 – 2026-03-07

Top medRxiv preprints most likely to be published in this journal, ranked by match strength.

1
A large deletion spanning multiple enhancers near PITX2 increases primary open-angle glaucoma risk
2026-03-02 ophthalmology 10.64898/2026.02.26.25342774
#1 (1.6%)

Importance: Genome-wide association studies have identified hundreds of common single nucleotide polymorphisms (SNPs) and small insertions/deletions (indels) associated with primary open-angle glaucoma (POAG) risk, though these variants have modest effect sizes and individually may have minor contributions to disease development. As whole-genome sequencing data is becoming more readily available, structural variants and other complex genomic features can be interrogated for contribution to disease...

2
Rare Coding Variant Associations With Primary Open-Angle Glaucoma In African Ancestry: A Multi-Cohort Exome-Wide Meta Analysis
2026-02-27 ophthalmology 10.64898/2026.02.25.26347141
#1 (1.5%)

Primary open-angle glaucoma (POAG) disproportionately affects individuals of African ancestry, yet rare coding variation in this population remains understudied. To address this gap, we performed a multi-cohort exome-wide meta-analysis across POAAGG, PMBB, All of Us, and UK Biobank, including 4,815 POAG cases and 22,922 controls of genetically inferred African ancestry. Although no gene reached exome-wide significance, we identified several suggestive gene-level associations driven by rare varia...

3
Advancing Legionella pneumophila genomic surveillance with a high-resolution cg/wgMLST schema for outbreak detection and investigation
2026-02-19 public and global health 10.64898/2026.02.18.26346554
Top 0.2% (1.1%)

Introduction: Sequence-based typing (SBT) has been the standard molecular typing method for understanding Legionella pneumophila genetic relationships. However, genome-scale typing approaches, namely core-genome (cg) or whole-genome (wg) multilocus sequence typing (MLST), provide higher discriminatory power. To advance these capabilities, the Legionella International Typing (LIT) workgroup was established to develop, evaluate, and disseminate a novel cgMLST schema with enhanced wgMLST resolution f...

4
Integrated monogenic and polygenic risk predicts disease progression in Fuchs endothelial corneal dystrophy
2026-02-18 genetic and genomic medicine 10.64898/2026.02.17.26346339
Top 0.2% (1.1%)

Purpose: Fuchs endothelial corneal dystrophy (FECD) is a common corneal disease and a leading indication for endothelial keratoplasty (EK). Although CTG18.1 repeat expansion is a major genetic risk factor, the contribution of polygenic background to disease progression remains unclear. We evaluated whether combining CTG18.1 expansion status with a FECD-specific polygenic risk score (PRS) enables genomic prediction of progression to EK. Methods: We retrospectively analysed 589 individuals with FECD ...

5
Benchmarking HLA genotyping from whole-genome sequencing across multiple sequencing technologies
2026-02-12 health informatics 10.64898/2026.02.10.26345621
Top 0.4% (0.7%)

Background: The hyperpolymorphic nature and structural complexity of the human leukocyte antigen (HLA) genomic region present challenges for accurate and scalable typing across diverse sample types. While whole-genome sequencing (WGS) offers the opportunity to infer HLA genotypes without targeted enrichment, systematic benchmarks across sequencing platforms, biospecimens and coverage levels remain limited. Results: We assembled a multi-platform resource of WGS datasets derived from short-read (Illum...

6
Household Transmission of Enterovirus D68 in Washington and Oregon, USA, 2022-2024
2026-02-22 infectious diseases 10.64898/2026.02.16.26346322
Top 0.6% (0.7%)

Household transmission of EV-D68 was identified in 35 of 1040 households (3.4%) in the Pacific Northwest between 2022-2024, with an estimated secondary attack rate of 15%. Sequences from within households clustered closely with 0 to 2 pairwise nucleotide differences (median 1) between cases 6-14 days apart (median 7).

7
The Representativeness of Regional Influenza Virus Genomic Surveillance for National Trends in the United States
2026-03-02 infectious diseases 10.64898/2026.02.23.26346422
Top 0.6% (0.7%)

Genomic surveillance of influenza viruses informs vaccine strain selection and evolutionary forecasting. Sequencing efforts vary widely across U.S. states, which raises concerns about spatial sampling bias. We evaluated how well 10,958 influenza virus genomes sampled by our group in Michigan captured the genetic diversity in 34,743 genomes circulating nationally from the 2021/22 through 2024/25 seasons. We defined seasonal hemagglutinin haplotypes and tracked their detection across states. A sma...

8
Genome-wide association study of corneal dystrophy uncovers novel risk loci and enables improved polygenic prediction of Fuchs endothelial corneal dystrophy
2026-02-15 genetic and genomic medicine 10.64898/2026.02.10.26345409
Top 0.7% (0.5%)

Objective: To identify risk loci for Fuchs endothelial corneal dystrophy (FECD) and improve a genetic risk prediction model. Design: Genome-wide association study (GWAS), polygenic risk score (PRS) construction, and TCF4 CTG18.1 short tandem repeat (STR) length inference. Participants: The study included 7,316 Europeans (EUR) with FECD or related corneal dystrophy phenotypes and 1,588,467 controls from the UK Biobank, All of Us, FinnGen, and the Million Veteran Program. Two independent EUR FECD coho...

9
Distinguishing causal from tagging enhancers using single-cell multiome data
2026-02-17 genetic and genomic medicine 10.64898/2026.02.15.26346353
Top 0.7% (0.5%)

Methods that analyze single-cell RNA-seq+ATAC-seq multiome data have shown promise in linking enhancers to target genes by correlating chromatin accessibility with gene expression across cells. However, correlations among ATAC-seq peaks may induce non-causal tagging peak-gene links (analogous to tagging associations in GWAS); indeed, we confirm that tagging effects induced by peak co-accessibility are pervasive in peak-gene linking. We defined two scores for each ATAC-seq peak: co-accessibility ...

10
An Integrated Deep Learning Framework for Small-Sample Biomedical Data Classification: Explainable Graph Neural Networks with Data Augmentation for RNA sequencing Dataset
2026-02-24 genetic and genomic medicine 10.64898/2026.02.22.26346827
Top 0.8% (0.5%)

Applying deep learning models to RNA-Seq data poses substantial challenges, primarily due to the high dimensionality of the data and the limited sample sizes. To address these issues, this study introduces an advanced deep learning pipeline that integrates feature engineering with data augmentation. The engineering application focuses on biomedical engineering, specifically the classification of RNA-Seq datasets for disease diagnosis. The proposed framework was initially validated on synthetic d...

11
PHARMWATCH: A Multilayer Pharmacogenomics Safety System for Accurate Star Allele Interpretation
2026-02-28 genetic and genomic medicine 10.64898/2026.02.26.26347200
Top 1% (0.4%)

The Clinical Pharmacogenetics Implementation Consortium (CPIC) bases its drug-gene recommendations on the assignment of star alleles, which map known genotypes to defined functional categories and corresponding drug dosage guidelines. The star allele framework, first proposed in 1996 for the CYP gene family and later formalized with CPIC's establishment in 2010 [1, 2], remains foundational to pharmacogenomics. However, this system has notable limitations. Its dependence on a restricted set of ben...

12
FA-NIVA: A Nextflow framework for automated analysis of Nanopore based long-read sequencing data for genetic analysis in Fanconi anemia
2026-03-04 genetic and genomic medicine 10.64898/2026.02.27.26346867
Top 1% (0.4%)

Motivation: Fanconi anemia (FA) is a rare disease mainly caused by biallelic pathogenic variants, including structural variants such as large deletions and insertions in FA genes. Currently, variant detection is based on short-read sequencing and probe-based approaches. However, determining the exact genomic breakpoint or achieving allelic discrimination remains challenging. Nanopore-based long-read sequencing enables a comprehensive detection of FA variants, but a unified bioinformatic analysis p...

13
Characterization of the somatic landscape and transcriptional profile of breast tumors from 748 Hispanic/Latina women in California
2026-02-17 genetic and genomic medicine 10.64898/2026.02.13.26346286
Top 1% (0.4%)

Somatic mutations and the tumor immune microenvironment in breast tumors are important predictors of treatment response and survival, yet data for Hispanic/Latina (H/L) women are limited. Here we analyzed whole exome sequencing data from tumor/normal pairs and RNAseq data from 748 H/L women and 388 non-Hispanic White (NHW) women. Overall, the somatic profiles in tumors from H/L women were similar to NHW women. However, somatic mutations in genome organizer CTCF were significantly more common in ...

14
Cancer genomic profiling predicts pathogenicity of BRCA1 and BRCA2 variants
2026-03-06 genetic and genomic medicine 10.64898/2026.03.05.26347746
Top 1% (0.4%)

Accurate classification of BRCA1 and BRCA2 variants is essential for cancer risk assessment and therapy selection, yet over one-third remain variants of uncertain significance (VUS). Here, using 120,660 real-world cancer genomic profiles with BRCA1 or BRCA2 variants from a >800,000-sample cohort, we develop machine learning models that predict pathogenicity using clinical and tumor-derived features, including a pan-cancer homologous recombination deficiency signature, co-mutated genes, zygosity,...

15
Constructing a Literature-Derived Database for Benchmarking Polygenic Risk Score Construction Methods with Spectral Ranking Inferences
2026-03-03 genetic and genomic medicine 10.64898/2026.03.01.26347258
Top 1% (0.4%)

Polygenic risk scores (PRSs) have emerged as a valuable tool for genetic risk prediction and stratification in human diseases. Over the past decade, extensive methodological efforts have focused on improving the predictive power of PRS, leading to the development of numerous methods for PRS construction. Benchmarking these various methods thus becomes an essential task that is crucial for guiding future PRS applications. While studies have benchmarked subsets of these methods on specific phenoty...

16
Monogenic Syndromes as a Cause of Adverse Drug Reactions in the Russian Population
2026-02-17 genetic and genomic medicine 10.64898/2026.02.13.26346297
Top 1% (0.4%)

Introduction: Adverse drug reactions (ADRs) remain a major public health issue, and genetic factors contribute importantly to interindividual variability in drug response. Pharmacogenetic testing helps reduce ADR risk by optimizing drug selection and dosage, particularly in monogenic disorders. Material and Methods: Whole-exome sequencing of 6,739 samples from the Russian population was performed using the MGIEasy Universal DNA Library Prep Set on the DNBSEQ-G400 platform (MGI). Variants in 48 gene...

17
Molecular characterisation of a Klebsiella pneumoniae neonatal sepsis outbreak in a rural Gambian hospital: a retrospective genomic epidemiology investigation
2026-03-04 genetic and genomic medicine 10.64898/2026.03.03.26347025
Top 2% (0.4%)

Background: Klebsiella pneumoniae is a common cause of neonatal sepsis in Africa, and is frequently hospital acquired. We recently reported an outbreak of multidrug-resistant K. pneumoniae sepsis amongst neonates at a rural hospital in The Gambia, West Africa, involving 57 cases and case fatality of 60%. Here we undertook a retrospective pathogen genomic epidemiology study of clinical and environmental K. pneumoniae isolated during the outbreak, to identify the outbreak strain, refine the epidemic...

18
The landscape of structural variants in male infertility identified by optical genome mapping
2026-03-02 genetic and genomic medicine 10.64898/2026.02.27.26347236
Top 2% (0.4%)

STUDY QUESTION: Do structural genomic variants, that can be identified by using optical genome mapping, contribute to male infertility? SUMMARY ANSWER: By using optical genome mapping we can identify several types of structural variants, both known and new, that may contribute to male infertility. WHAT IS KNOWN ALREADY: Traditional approaches such as karyotyping, CFTR and chromosome Y microdeletion testing are successful in explaining clinical findings in ~30% of MI patients, leaving the rest...

19
Pharmacogenomic Variants in the Russian Population: A Retrospective Analysis of 6102 Exomes
2026-02-17 genetic and genomic medicine 10.64898/2026.02.16.26346289
Top 2% (0.4%)

Background: Personalized pharmacotherapy requires systematic consideration of genetic factors influencing drug efficacy and safety. The accumulation of large-scale whole-exome sequencing (WES) data provides an opportunity to assess population frequencies of clinically significant pharmacogenetic variants; however, the diagnostic applicability of exome data for pharmacogenomics remains insufficiently studied. Materials and Methods: A retrospective analysis of 6,102 anonymized sequencing datasets obt...

20
Features Influencing Diagnostic Yield of Exome Sequencing in the DECIPHERD Study in Chile
2026-02-22 genetic and genomic medicine 10.64898/2026.02.12.26345769
Top 2% (0.3%)

Background: Exome sequencing (ES) has become a key diagnostic tool for rare diseases (RDs). However, most evidence on ES performance comes from high-income countries and patients from European ancestry. In countries such as Chile, limited access to next generation sequencing amplifies health disparities and highlights the need to identify which patients are most likely to benefit from ES. Methods: This study presents the second phase of the Chilean DECIPHERD project, in which we performed ES in a n...